We develop in more detail our reweighting method for incorporating new datasets in
parton fits based on a Monte Carlo representation of PDFs. After revisiting the derivation
of the reweighting formula, we show how to construct an unweighted PDF replica set which
is statistically equivalent to a given reweighted set. We then use reweighting followed by
unweighting to test the consistency of the method, specifically by verifying that results do
not depend on the order in which new data are included in the fit via reweighting. We
apply the reweighting method to study the impact of LHC W lepton asymmetry data on
the NNPDF2.1 set. We show how these data reduce the PDF uncertainties of light quarks
in the medium and small x region, providing the first solid constraints on PDFs from LHC
data.
1 Introduction



In a series of previous papers [1-8] we constructed increasingly accurate sets of parton
distributions (PDFs), using a Monte Carlo approach coupled to the use of neural networks 
as underlying interpolating functions. By definition, a PDF set provides a representation 
of a probability density in the space of parton distributions, i.e. a probability density in 
a space of functions [10-12]. We have performed various tests that confirm that NNPDF
parton sets do indeed behave in a way which is consistent with the desired statistical 
properties of functional probability densities. 

An advantage of providing a Monte Carlo representation of this PDF probability density
is that new information (such as might be provided by new experimental data) can
be included, using Bayes' theorem, by reweighting an existing PDF set, without having
to perform a new PDF fit [13, 14]: it is possible to determine a reweighting factor for
each Monte Carlo replica in such a way that the information contained in the new data
is included by simply computing weighted averages. This approach was first successfully
developed and implemented in Ref. [14], where it was explicitly shown, in studies involving
CDF and D0 inclusive jet data, that results obtained by reweighting are equivalent to
those found by including the new data in the fit.

Reweighting takes a set of equally likely PDF replicas generated by importance sam- 
pling, and assigns to them weights reflecting their relative probabilities in the light of new 
data not included in the original fit. In this paper we develop a second technique, which 
we call 'unweighting', which takes the reweighted set and replaces it with a new set of 
replicas which are again all equally probable. This new set of replicas can then be used 
in precisely the same way as a fitted set. Even though no new information is gained by 
unweighting, presenting reweighted PDFs in the same form as a corresponding refitted set 
has various obvious practical advantages. 

Furthermore, unweighting allows us to perform a highly nontrivial test of the reweight- 
ing procedure: namely, we take two new independent datasets, and use them to sequen- 
tially improve an existing set of replicas. This may then be done in either order, or indeed 
by treating them as one (combined) dataset. All three methods should yield equivalent 
results. Checking that this is the case provides a strong test of the method. However this 
can only be done if after each reweighting we unweight, because our simple closed-form 
expression for the weights can only be used for the reweighting of an equally probable (i.e. 
unweighted) set of PDFs. 

We perform this check by first taking the NNPDF2.0 NLO DIS+DY fit [7], based
on deep-inelastic and Drell-Yan data only, and taking as new datasets the CDF [15] and
D0 [16] Run II inclusive jet data. This completes and refines the studies of Ref. [14],
where it was verified that the inclusion of the combined CDF+D0 jet data by reweighting
or refitting gives equivalent results. We then perform a second check using as the prior the
NNPDF2.1 DIS fit [8], based on deep-inelastic data only, and taking as new datasets the
E605 Drell-Yan and Tevatron inclusive jet data. This provides a somewhat different
test, because while the D0 and CDF data used in the previous test measure the same
observable in the same kinematic region, the Drell-Yan and jet data affect different PDFs
in different kinematic regions.

Besides its practical usefulness, the combined reweighting plus unweighting procedure
is important because it allows one, at least in principle, to perform a global PDF fit by
sequentially including new data by reweighting a generic prior distribution of PDFs [13]. If
the information contained in the new data is sufficiently precise, and the prior distribution
sufficiently broad, the results will then be largely independent of the prior one starts from:
this would then give completely unbiased PDFs. In practice, this procedure is unlikely to
be viable because, in order to get accurate results, the prior set of PDF replicas would
have to be huge. However, the equivalence of PDFs obtained from reweighting with those
determined using a fitting procedure (such as the NNPDF sets) confirms that the latter
are also unbiased.

Following the success of these consistency tests, we use reweighting to evaluate the
impact on the NNPDF2.1 NLO fit [8] of recent LHC data on the W-lepton asymmetry
from the ATLAS and CMS collaborations. Using unweighting, we are able to produce a
new PDF set, NNPDF2.2, which incorporates the effect of these data and the older
W-lepton asymmetry data from D0.

The outline of this paper is as follows. In Sec. 2 we revisit the derivation of the
reweighting method, in particular the determination of the weights in terms of the $\chi^2$
of the fit of the new data to each replica, and we discuss some subtle issues that were
not tackled in Ref. [14], related to the definition of the measure in data space and to the
inclusion by reweighting of multiple data sets. Then, in Sec. 3 we present our method of
unweighting reweighted PDF sets, to give a set of replicas which are all equally probable,
and show that indeed the unweighted set is equivalent to the original reweighted set.
We follow this in Sec. 4 with a study of the consistency of the combined reweighting
and unweighting procedure, when applied to more than one dataset in turn. After this
theoretical study, we turn to phenomenology by using the method to investigate the impact
of LHC measurements of the W lepton asymmetry on PDFs. First, we show in Sec. 5
how these data reduce the PDF uncertainties of light quarks in the medium and small-x
region, providing the first solid constraints on PDFs from LHC data, and then in Sec. 6
we construct a new set of NLO PDFs, NNPDF2.2, which includes, on top of all the data
used to determine NNPDF2.1 PDFs, also the D0 W asymmetry data already discussed in
Ref. [14] and the LHC data discussed in Sec. 5.
2 Reweighting



In this section we revisit the derivation of the weight formula for reweighting ensembles 
of PDFs. In particular we discuss some of the more subtle issues in the formal proof 
presented in Ref. [14]. The derivation of the formula for the computation of the weights is
nontrivial because we are dealing with probability densities in multidimensional spaces. In 
particular we need to avoid the ambiguities that can appear when dealing with conditional 
probabilities with respect to an event of probability zero, the so-called Borel-Kolmogorov 
paradox [18]. The conditional probabilities need to be defined carefully as integrations of 
conditional probability densities over finite volumes, in the limit when these volumes are 
taken to zero. 

2.1 Integration over the data space 

Bayes' theorem can be stated in terms of probability densities:

$$\mathcal{P}(f|y)\,\mathcal{D}f\,\mathcal{P}(y)\,d^n y = \mathcal{P}(y|f)\,d^n y\,\mathcal{P}(f)\,\mathcal{D}f , \qquad (1)$$

where $\mathcal{D}f$ is the integration measure in the space of PDFs, and $d^n y$ is the integration
measure in the space of data. The latter is an n-dimensional real space, where n is the
number of data points used for reweighting. $\mathcal{P}(f)$ is the prior density in the space of the
PDFs: it is represented by the set $\{f_k\}$ of PDF replicas. These are all equally probable,
e.g., the expected PDF is simply determined as the average over the set $\{f_k\}$, and are
determined by importance sampling by starting from experimental data [11]. $\mathcal{P}(f|y)$ is
instead the new probability density, given the n data points y. Note that here, unlike in
Ref. [14], we do not make explicit the dependence of conditional probabilities on generic
prior information K (which includes the data used to determine the prior PDF, external
parameters such as $\alpha_s$, and theoretical assumptions such as the use of perturbative QCD at
a given order). $\mathcal{P}(y)$ is the prior density in the space of data, and we do not need to specify
its explicit form, since it can be fixed by requiring $\mathcal{P}(f|y)$ to be correctly normalised. The
only relevant property of $\mathcal{P}(y)$ is that it does not depend on the PDFs f.

In order to define the probability density $\mathcal{P}(f|y)$ at a given point y, we can integrate
Eq. (1) in a small sphere $S_\epsilon$ of radius $\epsilon$ centered at y. Integrating the left-hand side of
Eq. (1) over $S_\epsilon$ we obtain

$$\int_{S_\epsilon} \mathcal{P}(f|y')\,\mathcal{D}f\,\mathcal{P}(y')\,d^n y' = \left[n^{-1}\epsilon^n\,\Omega_n\,\mathcal{P}(y)\right]\mathcal{P}(f|y)\,\mathcal{D}f , \qquad (2)$$

where $\Omega_n$ is the solid angle in n dimensions. Integrating the right-hand side similarly, we
can cancel the volume factors on each side and thus take the limit $\epsilon \to 0$, to give

$$\mathcal{P}(f|y)\,\mathcal{D}f = \frac{\mathcal{P}(y|f)}{\mathcal{P}(y)}\,\mathcal{P}(f)\,\mathcal{D}f . \qquad (3)$$

Now $\mathcal{P}(y|f)$ is the likelihood density for the data y: assuming these data to be normally
distributed about central values y[f] (which of course depend on the PDF f),

$$\mathcal{P}(y|f) = (2\pi)^{-n/2}(\det\sigma)^{-1/2}\exp\left[-\tfrac{1}{2}\,(y-y[f])^T\sigma^{-1}(y-y[f])\right] , \qquad (4)$$

where $\sigma$ is the experimental covariance matrix. The only dependence on f is through the
value of

$$\chi^2(y,f) \equiv (y-y[f])^T\sigma^{-1}(y-y[f]) . \qquad (5)$$

It now follows from Eqs. (3-5) that

$$\mathcal{P}(f|y)\,\mathcal{D}f \propto \exp\left(-\tfrac{1}{2}\chi^2(y,f)\right)\mathcal{P}(f)\,\mathcal{D}f , \qquad (6)$$

with a constant of proportionality that depends on y, but not on f, and can thus be fixed
if necessary through the normalization condition $\int \mathcal{P}(f|y)\,\mathcal{D}f = 1$.
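
As an illustration of Eq. (5), the following minimal sketch evaluates the $\chi^2$ of a set of data points against theory predictions for a given covariance matrix. It is not taken from the NNPDF code: the function names `cholesky` and `chi2` and the toy numbers are hypothetical, and the covariance matrix is assumed to be symmetric and positive definite so that a Cholesky factorisation can be used instead of an explicit inverse.

```python
import math

def cholesky(matrix):
    """Return the lower-triangular factor L with matrix = L L^T.

    Assumes a symmetric, positive-definite matrix given as a list of rows.
    """
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - partial)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower

def chi2(data, theory, covariance):
    """chi^2 = (y - y[f])^T sigma^{-1} (y - y[f]), as in Eq. (5)."""
    residual = [d - t for d, t in zip(data, theory)]
    lower = cholesky(covariance)
    # Forward substitution solves L z = r, so that z.z = r^T sigma^{-1} r.
    z = []
    for i, r_i in enumerate(residual):
        partial = sum(lower[i][k] * z[k] for k in range(i))
        z.append((r_i - partial) / lower[i][i])
    return sum(z_i * z_i for z_i in z)

# Toy example (made-up numbers): two correlated data points.
toy_covariance = [[2.0, 1.0], [1.0, 2.0]]
toy_chi2 = chi2([1.0, 1.0], [0.0, 0.0], toy_covariance)
print(toy_chi2)
```

For an uncorrelated covariance matrix the result reduces to the familiar sum of squared residuals divided by the variances.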

2.2 Weights for a given $\chi$

This is all fine so far as it goes, but is not sufficient to give us a reweighting of our ensemble
of PDFs equivalent to a refitting. The reason for this is that when we fit PDFs, we do not
demand that the predictions y[f] coincide with the data points y, but rather that the figure
of merit $\chi^2(y,f)$ is optimized. Thus rather than integrating both sides of Eq. (1) over the
small spheres $S_\epsilon$, we should integrate over all y subject only to the single constraint that
$\chi^2(y,f) = \chi^2$, for some fixed value $\chi$. It is convenient to choose as a parameter $\chi$, rather
than $\chi^2$, because we can interpret $\chi$ as the radial co-ordinate in a system of spherical polar
co-ordinates in function space, centered at y' = y[f].
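The left-hand side of Eq. (1) thus becomes

$$\int \delta(\chi - \chi(y',f))\,\mathcal{P}(f|y')\,\mathcal{D}f\,\mathcal{P}(y')\,d^n y' \equiv \mathcal{P}(f|\chi)\,\mathcal{D}f , \qquad (7)$$

thus defining $\mathcal{P}(f|\chi)$ up to an overall constant (independent of f). We can evaluate it by
performing the same integration over the right-hand side of Eq. (1), since the dependence
on $\mathcal{P}(f)$ factorises:

$$\int \delta(\chi - \chi(y',f))\,\mathcal{P}(y'|f)\,d^n y'\,\mathcal{P}(f)\,\mathcal{D}f = 2^{1-n/2}\left(\Gamma(n/2)\right)^{-1}\chi^{n-1}e^{-\frac{1}{2}\chi^2}\,\mathcal{P}(f)\,\mathcal{D}f , \qquad (8)$$

where we have used Eq. (4) for the likelihood, and performed the integral over y' in
spherical co-ordinates. Comparing Eq. (7) and Eq. (8) we thus find

$$\mathcal{P}(f|\chi)\,\mathcal{D}f \propto \chi^{n-1}e^{-\frac{1}{2}\chi^2}\,\mathcal{P}(f)\,\mathcal{D}f . \qquad (9)$$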
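
The prefactor on the right-hand side of Eq. (8) can be checked numerically: as a function of $\chi$ it should be a normalised density whose mean $\chi^2$ equals n and whose maximum sits at $\chi = \sqrt{n-1}$. The sketch below does this with a simple trapezoid rule; the names `chi_density` and `integrate`, the choice n = 20, and the integration range are illustrative assumptions only.

```python
import math

def chi_density(chi, n):
    """2^{1-n/2} / Gamma(n/2) * chi^{n-1} * exp(-chi^2 / 2), cf. Eq. (8)."""
    if chi <= 0.0:
        return 0.0
    log_density = ((1.0 - 0.5 * n) * math.log(2.0) - math.lgamma(0.5 * n)
                   + (n - 1) * math.log(chi) - 0.5 * chi * chi)
    return math.exp(log_density)

def integrate(func, lower, upper, steps=20000):
    """Composite trapezoid rule on a uniform grid."""
    width = (upper - lower) / steps
    total = 0.5 * (func(lower) + func(upper))
    for i in range(1, steps):
        total += func(lower + i * width)
    return total * width

n_points = 20  # number of data points (arbitrary choice for the check)
norm = integrate(lambda c: chi_density(c, n_points), 0.0, 30.0)
mean_chi2 = integrate(lambda c: c * c * chi_density(c, n_points), 0.0, 30.0)
peak = math.sqrt(n_points - 1)  # stationary point of chi^{n-1} exp(-chi^2/2)

print(norm, mean_chi2, peak)
```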

In order to define the weight to be associated to each replica, we need to define the
probability for each replica by integrating the probability density over a finite volume, and
then send that volume to zero. For a given replica $f_k$ we thus integrate $\chi'$ over the region
$\chi_k < \chi' < \chi_k + \epsilon$, where $\chi_k = \chi(y, f_k)$:

$$\int_{\chi_k}^{\chi_k+\epsilon} d\chi'\,\mathcal{P}(f_k|\chi') = \epsilon\,\mathcal{P}(f_k|\chi_k) . \qquad (10)$$

Note that this corresponds to integrating Eq. (7) over a spherical shell, centered on $y[f_k]$,
of radius $\chi_k$ and thickness $\epsilon$. The thickness of the shell is independent of the choice of
replica: if it were not, we would bias the result.
It is easy to see using Eq. (9) that Eq. (10) gives the formula derived in Ref. [14] for
the weights: since the replicas in the prior distribution all have equal probability, $\mathcal{P}(f_k)$
is independent of the choice of replica $f_k$, and the weights are

$$w_k \propto \mathcal{P}(f_k|\chi_k) \propto \chi_k^{n-1}e^{-\frac{1}{2}\chi_k^2} . \qquad (11)$$

The constant of proportionality may be fixed by normalizing the sum of the weights to
the number of replicas.
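
A minimal sketch of how the weights of Eq. (11) might be evaluated in practice is given below. Working with logarithms and subtracting the largest log-weight before exponentiating is a choice made here to avoid numerical underflow for large $\chi^2$; it is not something prescribed by the text. The function name `reweighting_weights` and the toy $\chi^2$ values are hypothetical.

```python
import math

def reweighting_weights(chi2_values, n_data):
    """Weights w_k ~ chi_k^{n-1} exp(-chi_k^2 / 2), normalised to sum to N_rep.

    chi2_values: chi^2 of the new data for each replica (all must be > 0).
    n_data: number of new data points n.
    """
    log_w = [0.5 * (n_data - 1) * math.log(c2) - 0.5 * c2 for c2 in chi2_values]
    shift = max(log_w)  # common factor, cancels in the normalisation
    unnormalised = [math.exp(lw - shift) for lw in log_w]
    total = sum(unnormalised)
    n_rep = len(chi2_values)
    return [n_rep * u / total for u in unnormalised]

# Toy example: three replicas, n = 10 data points (made-up chi^2 values).
toy_weights = reweighting_weights([9.0, 30.0, 100.0], n_data=10)
print(toy_weights)
```

In this toy case the replica with $\chi^2 = n - 1$ receives the largest weight, as expected from the position of the maximum of $\chi^{n-1}e^{-\chi^2/2}$.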

The factor of $\chi_k^{n-1}$ takes account of the fact that when there are many data points,
larger values of Xk have a larger phase space available to them, while very small values 
are phase space suppressed: however good the model it is always very unlikely that the 
theoretical prediction will give exactly the right result for a large number of measurements. 
This is not a trivial result: it depends critically on choosing the correct volume upon which 
to integrate in the space of the new data y. Starting from the same probability density, 
but using a different integration volume would produce a different result. Hence we need 
to justify our particular choice of volume. 

In this respect, we note that our choice includes all points in the space of y with a
particular $\chi^2$, and that the thickness of the shell is independent of its radius $\chi(y,f)$ or centre
y[f], in the same way that in Eq. (2) the radius of the little sphere was also independent
of y[f]. The ultimate justification in both cases is that the probability measure $d^n y$ on
the space y is uniform, i.e. that equal volumes have equal probability: this assumption
is of course implicit from the start, since without it the likelihood Eq. (4) would not be
Gaussian.

Note that although the above argument is most naturally expressed using $\chi$ as a
co-ordinate in function space we would get the same weights $w_k$ if we were to instead use $\chi^2$,
or indeed a conditional dependence on any other monotonic function of $\chi$, so long as we
use the same volume in the space of data to define the weights. To see this, note that for
example

$$\mathcal{P}(f|\chi^2)\,\mathcal{D}f \equiv \int \delta(\chi^2 - \chi^2(y',f))\,\mathcal{P}(f|y')\,\mathcal{D}f\,\mathcal{P}(y')\,d^n y' , \qquad (12)$$

so that, comparing with Eq. (7),

$$\mathcal{P}(f|\chi^2) = \mathcal{P}(f|\chi)/(2\chi) . \qquad (13)$$

As expected, we thus have $\mathcal{P}(f|\chi)\,d\chi = \mathcal{P}(f|\chi^2)\,d\chi^2$. If we work with $\mathcal{P}(f|\chi^2)$, in order to
be sure to use the same volume in the space of data (i.e. a spherical shell of thickness $\epsilon$)
we must now integrate over the interval $\chi_k^2 < (\chi')^2 < \chi_k^2 + 2\chi_k\epsilon$:

$$w_k \propto \int_{\chi_k^2}^{\chi_k^2+2\chi_k\epsilon} d\chi'^2\,\mathcal{P}(f_k|\chi'^2) , \qquad (14)$$

which then yields exactly the same weight Eq. (11) as obtained using Eq. (10).
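
The intermediate step can be spelled out as follows. This is a sketch to leading order in $\epsilon$, using only the relation between the densities in $\chi^2$ and in $\chi$ (a Jacobian factor of $1/(2\chi)$) and the single-shell integral over $\chi$:

```latex
w_k \;\propto\; \int_{\chi_k^2}^{\chi_k^2+2\chi_k\epsilon} d\chi'^2\,
      \mathcal{P}(f_k|\chi'^2)
 \;\simeq\; 2\chi_k\epsilon\;\mathcal{P}(f_k|\chi_k^2)
 \;=\; 2\chi_k\epsilon\;\frac{\mathcal{P}(f_k|\chi_k)}{2\chi_k}
 \;=\; \epsilon\,\mathcal{P}(f_k|\chi_k)
 \;\propto\; \chi_k^{\,n-1}\,e^{-\frac{1}{2}\chi_k^2} .
```

The factor $2\chi_k$ from the width of the interval in $\chi^2$ cancels the Jacobian, which is why the weights do not depend on the choice of variable.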
2.3 Multiple experiments 

Let us now discuss the implications of the above prescription for reweighting with more
than one set of data. Suppose we are given a set of new data {y}, which is made of
two independent subsets $\{y_1\}$ and $\{y_2\}$, containing respectively $n_1$ and $n_2$ data points,
such as for example a dataset which includes results from two independent experimental
measurements (of the same, or of different observables).

When the two sets of data are used for reweighting simultaneously, the only quantity
that matters is the total $\chi^2$ of the two experiments. Since we assumed the experiments to
be independent, $\chi^2 = \chi_1^2 + \chi_2^2$, where $\chi_i \equiv \chi(y_i, f)$, and the probability density is therefore
given by Eq. (9) above:

$$\mathcal{P}(f|\chi) \propto (\chi_1^2+\chi_2^2)^{\frac{1}{2}(n_1+n_2-1)}\,e^{-\frac{1}{2}(\chi_1^2+\chi_2^2)} . \qquad (15)$$

Clearly the individual values of $\chi^2$ of the two sets need not each be fixed to $\chi_1^2$ and
$\chi_2^2$. Hence even though the likelihood factorizes,

$$\mathcal{P}(y_1 y_2|f) = \mathcal{P}(y_2|f)\,\mathcal{P}(y_1|f) , \qquad (16)$$

the weights do not:

$$\mathcal{P}(f|\chi) \neq \mathcal{P}(f|\chi_2)\,\mathcal{P}(f|\chi_1) . \qquad (17)$$

Instead they are determined through the more complicated relation (see Eqs. (7) and (8))

$$\mathcal{P}(f|\chi) \propto \int \delta\left(\chi - (\chi_1^2+\chi_2^2)^{1/2}\right)\mathcal{P}(y_2'|f)\,d^{n_2}y_2'\;\mathcal{P}(y_1'|f)\,d^{n_1}y_1' . \qquad (18)$$

With Gaussian likelihoods Eq. (4), the integrals can be evaluated to give Eq. (15).

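This means that if we wish to proceed sequentially, then after weighting with the first
data set, with the usual weights $\chi_{1k}^{n_1-1}\exp(-\frac{1}{2}\chi_{1k}^2)$, the weights for the second data set are
not given by

$$w_{2k} \propto \chi_{2k}^{n_2-1}\exp\left(-\tfrac{1}{2}\chi_{2k}^2\right) \qquad (19)$$

but rather by

$$w_{2|1,k} \propto (\chi_{1k}^2+\chi_{2k}^2)^{\frac{1}{2}(n_1+n_2-1)}\,\chi_{1k}^{-(n_1-1)}\exp\left(-\tfrac{1}{2}\chi_{2k}^2\right) . \qquad (20)$$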

This perhaps appears odd at first sight, but is as it should be: the first dataset has altered
the probability distribution of the PDFs, and thus the probabilities of the replicas before
the second dataset can be considered must necessarily change. This is taken into account
by dividing out the phase space factor of the first dataset, and multiplying by that
of the combined dataset.
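
The bookkeeping can be verified with a few lines of arithmetic. The sketch below works with log-weights and checks that the weight for the first dataset, multiplied by the conditional weight for the second (the first phase space factor divided out, the combined one multiplied in), reproduces the combined-dataset weight exactly, whereas the naive product of two single-dataset weights differs from it by a replica-dependent factor. The function names and the toy $\chi^2$ values are hypothetical; only the numbers of data points (76 and 110) are borrowed from the jet datasets discussed in the text.

```python
import math

def log_weight_single(chi2, n):
    """log of chi^{n-1} exp(-chi^2 / 2) for one dataset with n points."""
    return 0.5 * (n - 1) * math.log(chi2) - 0.5 * chi2

def log_weight_second_given_first(chi2_1, chi2_2, n1, n2):
    """log of the conditional weight for dataset 2 after weighting with dataset 1."""
    return (0.5 * (n1 + n2 - 1) * math.log(chi2_1 + chi2_2)
            - 0.5 * (n1 - 1) * math.log(chi2_1)
            - 0.5 * chi2_2)

def log_weight_combined(chi2_1, chi2_2, n1, n2):
    """log of the weight when both datasets are treated as one."""
    return log_weight_single(chi2_1 + chi2_2, n1 + n2)

n1, n2 = 76, 110
replicas = [(70.0, 120.0), (90.0, 100.0)]  # made-up (chi2_1, chi2_2) pairs

sequential = [log_weight_single(c1, n1) + log_weight_second_given_first(c1, c2, n1, n2)
              for c1, c2 in replicas]
combined = [log_weight_combined(c1, c2, n1, n2) for c1, c2 in replicas]
naive = [log_weight_single(c1, n1) + log_weight_single(c2, n2) for c1, c2 in replicas]

# Differences between naive and combined log-weights, replica by replica.
naive_offsets = [a - b for a, b in zip(naive, combined)]
print(sequential, combined, naive_offsets)
```

If the naive product were proportional to the combined weight, the entries of `naive_offsets` would all be equal; for these toy values they are not.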

Nevertheless, it is possible to factorize the reweightings due to more than one dataset,
if rather than attempting successive reweightings of the same set of replicas, one first
turns the original weighted set into an unweighted set, and then computes the second set
of weights using this set. This procedure will be discussed in detail in Sec. 4; however,
before we can do this we must first develop a procedure for unweighting.
3 Unweighting



In this section we present a method to unweight reweighted PDF sets so that they can
be used without the need for including weights for individual replicas. The starting point
is a set of $N_{\rm rep}$ reweighted replicas. Each replica, identified by the index $k = 1, \ldots, N_{\rm rep}$,
carries a weight $w_k$ defined in Eq. (11), determined by comparing each of the replicas of
the original unweighted distribution to the new experimental information. Our goal is
to unweight this PDF set in order to obtain a new set of $N'_{\rm rep}$ replicas with all weights
equal to unity, but with the same probability distribution as the original weighted set, i.e.
such that any moment of the probability distribution computed from the weighted and
unweighted set would be the same in the limit in which $N'_{\rm rep} \to \infty$.
Figure 1: Graphical representation of the construction of a set of $N'_{\rm rep}$ unweighted replicas from a
set of $N_{\rm rep} = 20$ weighted ones. Each segment is in one-to-one correspondence to a replica, and its
length is proportional to the weight of the replica. The cases of $N'_{\rm rep} \gg N_{\rm rep}$ (top) and $N'_{\rm rep} = 10$
(bottom) are shown.
3.1 The unweighting method

The basic idea for constructing the unweighted set consists of selecting replicas from the
weighted set of $N_{\rm rep}$ replicas in such a way that replicas carrying a relatively high weight
are chosen repeatedly, while those with vanishingly small weight disappear from the final
unweighted set. The method is depicted graphically in Fig. 1. We start by subdividing
a line of unit length into $N_{\rm rep}$ segments, in such a way that for each replica the length
of the corresponding segment is proportional to the weight of the replica, and thus to
its probability. The ordering of the segments is random. In order to extract a set of
$N'_{\rm rep}$ replicas that faithfully represents this distribution, we draw another unit interval
directly below the first, and subdivide it into $N'_{\rm rep}$ segments all of equal length $1/N'_{\rm rep}$. We
then select replicas from the original weighted set by taking a number of copies of each
replica equal to the number of lower segments whose right edge is contained in the upper
segment corresponding to that specific replica. A little thought shows that the (all equally
probable) $N'_{\rm rep}$ replicas in the lower set are then chosen according to the probabilities of
the $N_{\rm rep}$ replicas in the upper set.

To see this, note that, if the number $N'_{\rm rep}$ of replicas is large enough (top plot in
Fig. 1), then at least one lower segment (width $1/N'_{\rm rep}$) will be contained in each upper
segment, and the original probability distribution is reproduced. This case is however
unrealistic, as it would require $N'_{\rm rep}$ to be as large as the ratio between the highest and
lowest weight, which can be very large indeed. It is also unnecessary, because the amount
of information carried by the weighted set is measured by its Shannon entropy, which can
be used to determine the effective number of unweighted replicas $N_{\rm eff}$ which carry the same
information [15]. Hence, it is pointless to include a number of replicas $N'_{\rm rep}$ significantly
larger than $N_{\rm eff}$, as no information is then gained. Because by construction $N_{\rm eff} < N_{\rm rep}$,
the more realistic situation is depicted in the bottom plot of Fig. 1: for the larger weights
several unweighted segments are contained in a weighted one, but for the smaller weights
there are often none at all, since we only select a replica if the edge of a lower segment
is contained in the upper segment corresponding to that replica. Which replica is chosen
among many all with equally small weight is of course entirely random, since the ordering
of the replicas is random.

We can now formulate the unweighting algorithm quantitatively. We start with a set
of $N_{\rm rep}$ replicas, each carrying a weight $w_k$, Eq. (11); as in Ref. [14], we normalize the weights
according to

$$\sum_{k=1}^{N_{\rm rep}} w_k = N_{\rm rep} . \qquad (21)$$

The probability of each replica is determined given its weight as

$$p_k = \frac{w_k}{N_{\rm rep}} . \qquad (22)$$

We then define probability cumulants

$$P_k \equiv P_{k-1} + p_k = \sum_{j=0}^{k} p_j , \qquad (23)$$

where in the last step we take $P_0 = 0$. By construction, $0 \le P_k \le 1$ and $P_{k-1} \le P_k$.
Indeed, the cumulants provide the co-ordinate of the edge of the k-th upper segment in
the plot of Fig. 1, with origin at the left edge of the unit interval.
The unweighted set is then constructed as follows. We start with $N_{\rm rep}$ weights $w_k$, and
we determine $N_{\rm rep}$ new weights

$$w'_k = \sum_{j=1}^{N'_{\rm rep}} \theta\left(\frac{j}{N'_{\rm rep}} - P_{k-1}\right)\theta\left(P_k - \frac{j}{N'_{\rm rep}}\right) . \qquad (24)$$

The weights $w'_k$ are either zero or positive integers, and they satisfy the normalization
condition

$$N'_{\rm rep} \equiv \sum_{k=1}^{N_{\rm rep}} w'_k ; \qquad (25)$$

in fact, they correspond to the graphical counting procedure described previously. The
unweighted set is then simply constructed by taking $w'_k$ copies of the k-th replica, for all
$k = 1, \ldots, N_{\rm rep}$. The probability of replica k in the new unweighted set is then given by

$$p'_k = \frac{w'_k}{N'_{\rm rep}} . \qquad (26)$$

As a consequence we have

$$\lim_{N'_{\rm rep} \to \infty} p'_k = p_k , \qquad (27)$$

i.e. the unweighted set reproduces the probabilities of the weighted set in the limit of large
sample size, as it ought to.
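
The counting procedure can be sketched in a few lines. The code below is a minimal illustration under stated assumptions, not the NNPDF implementation: the names `unweight` and `unweighted_indices` are hypothetical, replicas with zero weight are dropped before the cumulants are built, the last cumulant is set to exactly 1 to guard against rounding, and a point j/N'_rep that falls exactly on a segment boundary is assigned to the segment ending there. The optional `rng` argument implements the random ordering of the segments.

```python
import bisect
import random

def unweight(weights, n_new, rng=None):
    """Return the integer weights w'_k (number of copies of each replica).

    weights: non-negative weights w_k of the reweighted set.
    n_new:   size N'_rep of the unweighted set.
    rng:     optional random.Random used to shuffle the segment ordering.
    """
    order = [k for k, w in enumerate(weights) if w > 0]
    if rng is not None:
        rng.shuffle(order)
    total = float(sum(weights))
    cumulants = []  # right edges P_k of the upper segments
    running = 0.0
    for k in order:
        running += weights[k] / total
        cumulants.append(running)
    cumulants[-1] = 1.0  # remove rounding error at the end of the unit interval
    counts = [0] * len(weights)
    for j in range(1, n_new + 1):
        # First upper segment whose right edge is at or beyond j / N'_rep.
        position = bisect.bisect_left(cumulants, j / n_new)
        counts[order[position]] += 1
    return counts

def unweighted_indices(counts):
    """Indices of the original replicas, each repeated w'_k times."""
    return [k for k, c in enumerate(counts) for _ in range(c)]

toy_counts = unweight([2.0, 1.0, 0.0, 1.0], n_new=4)
print(toy_counts, unweighted_indices(toy_counts))
shuffled_counts = unweight([2.0, 1.0, 0.0, 1.0], n_new=40, rng=random.Random(0))
print(shuffled_counts)
```

The returned counts always sum to N'_rep, and a replica with vanishing weight is never selected.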

As already mentioned, even though exact identity of the reweighted and unweighted
probability distribution holds in the limit Eq. (27), the amount of information contained
in the weighted set corresponds to $N_{\rm eff} < N_{\rm rep}$ unweighted replicas, with $N_{\rm eff}$ determined
as in Eq. (10) of Ref. [14] from the Shannon entropy. Therefore for practical applications
it is advisable to take $N'_{\rm rep} < N_{\rm eff}$. There is nothing in principle wrong with
taking $N'_{\rm rep} > N_{\rm eff}$, but this would just lead to a highly redundant replica set. We will study
the dependence of unweighted results on $N'_{\rm rep}$ in an explicit example below.
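
The definition of the effective number of replicas is given in Ref. [14] and is not reproduced in this paper. The sketch below assumes the standard Shannon-entropy form, $N_{\rm eff} = \exp\{\frac{1}{N}\sum_k w_k \ln(N/w_k)\}$ for weights normalised to sum to N, which is equivalent to the exponential of the entropy of the probabilities $p_k = w_k/N$. Both this form and the function name `effective_replicas` should be read as assumptions to be checked against the reference.

```python
import math

def effective_replicas(weights):
    """exp of the Shannon entropy of p_k = w_k / sum(w) (assumed form of N_eff).

    Replicas with zero weight contribute nothing (0 * log 0 is taken as 0).
    """
    total = float(sum(weights))
    entropy = 0.0
    for w in weights:
        p = w / total
        if p > 0.0:
            entropy -= p * math.log(p)
    return math.exp(entropy)

# Equal weights: every replica counts, so N_eff equals the number of replicas.
print(effective_replicas([1.0] * 10))
# Two replicas carry all the weight: N_eff drops to two.
print(effective_replicas([2.0, 2.0, 0.0, 0.0]))
```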
Figure 2: Distance between central values (left) and uncertainties (right) of the reweighted and
unweighted PDFs determined from $N_{\rm rep} = 1000$ replicas of NNPDF2.0 DIS+DY reweighted with
Tevatron jet data, as described in the text. The corresponding distances between refitted and
reweighted PDFs were shown in Fig. 2 of Ref. [14].

Figure 3: Left: the relative Shannon entropy $H_R(N'_{\rm rep})$, Eq. (28), as function of $N'_{\rm rep}$ for the
reweighted and unweighted PDFs described in the caption of Fig. 2. Right: the effective number
of replicas of the unweighted set $N'_{\rm eff}$ as function of $N'_{\rm rep}$. The dashed vertical line denotes the
value $N'_{\rm rep} = N_{\rm eff}$. In all plots a moving average of 25 replicas has been performed to smooth out
random fluctuations.



3.2 Testing unweighting 

As a proof of concept of the unweighting technique, we will apply it to the two cases
discussed in Ref. [2]: the reweighting of NNPDF2.0 DIS+DY with Tevatron inclusive
jet data and the reweighting of NNPDF2.0 with the D0 muon and inclusive electron W
lepton asymmetry data.

First, we consider the reweighting of NNPDF2.0 DIS+DY [7] with the Tevatron
inclusive jet data. As discussed in Ref. [14], starting with $N_{\rm rep} = 1000$ NNPDF2.0
DIS+DY replicas, after reweighting with jet data the effective number of replicas is
$N_{\rm eff} = 334$. A reasonable choice for the size of the unweighted set would be any number
less than this: here we chose $N'_{\rm rep} = 100$. We perform the unweighting following the
procedure discussed above. The comparison between the reweighted PDFs and the unweighted
set can be made quantitative by determining the distances between PDFs and uncertainties.
Distances were defined in Appendix A of Ref. [7], and in Ref. [14] in the weighted
case; recall that distances $d \sim 1$ correspond to statistically identical distributions, while
(with $N_{\rm rep} = 100$ replicas) $d \sim 7$ corresponds to distributions which are statistically
inequivalent, but agree to one sigma. The distances between the reweighted PDF set and
the same PDF set after unweighting are shown in Fig. 2. The corresponding distances
between reweighted and refitted PDFs were shown in Fig. 2 of Ref. [14]. It is clear that
the distances between reweighted and unweighted sets are generally smaller than those
between the reweighted and the refitted sets, and they all fluctuate about $d \sim 1$, showing
statistical equivalence (with the possible exception of the light sea asymmetry at small x,
which is subject to very large uncertainties). We conclude that there is no significant loss
of accuracy in the reweighting due to the unweighting.

Figure 4: Distance between central values (left) and uncertainties (right) of the reweighted and
unweighted PDFs determined from $N_{\rm rep} = 1000$ replicas of NNPDF2.0 DIS+DY reweighted with
D0 W-lepton asymmetry data, as described in the text.

We can now study the information contained in the unweighted set as the number of
unweighted replicas $N'_{\rm rep}$ is varied. To this purpose, we compute the relative Shannon
entropy between the unweighted set and the original weighted set, defined as

$$H_R(N'_{\rm rep}) = \sum_{k=1}^{N_{\rm rep}} p'_k \ln\frac{p'_k}{p_k} , \qquad (28)$$

where $p'_k$ are the probabilities Eq. (26), defined for each value of $N'_{\rm rep}$. If the starting
number of replicas $N_{\rm rep}$ is large enough that $N'_{\rm rep} \sim N_{\rm rep}$ is already in the asymptotic region
where Eq. (27) holds, then clearly for large $N'_{\rm rep} \sim N_{\rm rep}$ the relative entropy $H_R(N'_{\rm rep})$
should fall to zero. For lower values of $N'_{\rm rep}$, $H_R(N'_{\rm rep})$ measures the information loss
between the original weighted set and the unweighted one.
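
A minimal sketch of the relative entropy between an unweighted set and the weighted set it was drawn from is given below. It takes the relative entropy to be $\sum_k p'_k \ln(p'_k/p_k)$, with replicas that were not selected ($p'_k = 0$) contributing zero; the function name `relative_entropy` and the toy inputs are hypothetical.

```python
import math

def relative_entropy(counts, weights):
    """Relative entropy sum_k p'_k ln(p'_k / p_k).

    counts:  integer weights w'_k of the unweighted set (copies of each replica).
    weights: weights w_k of the original reweighted set.
    Terms with p'_k = 0 are skipped (0 * log 0 is taken as 0); a replica that
    was selected is assumed to have non-zero weight in the original set.
    """
    n_new = float(sum(counts))
    total = float(sum(weights))
    entropy = 0.0
    for c, w in zip(counts, weights):
        if c > 0:
            p_new = c / n_new
            p_old = w / total
            entropy += p_new * math.log(p_new / p_old)
    return entropy

toy_weights = [2.0, 1.0, 1.0]
# An unweighted set reproducing the probabilities exactly carries no loss.
print(relative_entropy([2, 1, 1], toy_weights))
# Keeping a single replica discards information about the other two.
print(relative_entropy([1, 0, 0], toy_weights))
```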

In Fig. 3 we display $H_R(N'_{\rm rep})$. It is clear that $H_R(N'_{\rm rep})$ falls linearly as a function
of $N'_{\rm rep}$ up to $N_{\rm eff}$, as more and more of the information in the weighted set is included.
Around $N'_{\rm rep} \sim N_{\rm eff}$ the slope of the fall changes abruptly, and $H_R(N'_{\rm rep})$ then falls slowly
to zero as $N'_{\rm rep}$ increases, being already close to zero when $N'_{\rm rep} \sim N_{\rm rep}$. This can also be
seen by computing directly the effective number of replicas $N'_{\rm eff}$ of the unweighted set as
a function of $N'_{\rm rep}$, which can be determined using Eq. (10) of Ref. [14], with the weights
$w'_k$ Eq. (24) and $N = N'_{\rm rep}$. Note that the result is nontrivial because some of the $w'_k$ are
zero, others are integers larger than one, and the dependence on $N'_{\rm rep}$ comes about only
through the definition of the weights Eq. (24). The result is also shown in Fig. 3: at first
$N'_{\rm eff}$ grows linearly as a function of $N'_{\rm rep}$, and is very nearly equal to it. However,
when it reaches $N'_{\rm rep} \sim N_{\rm eff}$, the linear growth breaks off abruptly, and saturates at the
value $N'_{\rm eff} = N_{\rm eff}$, which is reached asymptotically. Hence our expectation is borne out by
these plots: the amount of information in the unweighted set increases with the number
of unweighted replicas $N'_{\rm rep}$, but only up to the point $N'_{\rm rep} \sim N_{\rm eff}$, after which nothing is
gained by further increasing $N'_{\rm rep}$.

We now repeat the same analysis for the unweighting of the NNPDF2.0 set, reweighted
with the inclusive electron and muon D0 Run-II W lepton asymmetry data [20, 21]. The
reweighting procedure for these data was presented in detail in Ref. [14]. The effective
number of replicas, after reweighting a starting set of $N_{\rm rep} = 1000$ replicas, is in this case
$N_{\rm eff} = 356$. Again, we can choose the size of the unweighted set to be $N'_{\rm rep} = 100$, as in
the case above, and we perform the unweighting following the same procedure as before.

In Fig. 4 we show the distance between the reweighted and unweighted sets, and
in Fig. 5 we plot the relative entropy between these two sets and the effective number
of replicas in the unweighted set as a function of the number of unweighted replicas.
The conclusions are the same as before: the unweighted set is indistinguishable from the
reweighted one, provided that the number of unweighted replicas $N'_{\rm rep}$ is of the same order
as the effective number of reweighted replicas $N_{\rm eff}$. In the sequel we will thus feel free
to use unweighted replica sets instead of their weighted counterparts, to which they are
essentially equivalent.
4 Consistency



4.1 Multiple Reweighting 

As we discussed in Sec. 2.3, when adding two new datasets to a set of prior PDFs, one
way to proceed is to treat them as a single combined dataset, as in Eq. (15), i.e., with
weights $\chi^{n-1}\exp(-\frac{1}{2}\chi^2)$ with $\chi^2 = \chi_1^2 + \chi_2^2$, $n = n_1 + n_2$. However, it should also
be possible to treat them separately, weighting with first one dataset, then the other. If
we do this using Eq. (20) then by construction we get the same answer that we would get
by including the two sets at once, but this is trivial, because in the weights Eq. (20) the
effect of the first weighting is divided out.

However, we can test non-trivially that two subsequent weightings by two independent
datasets commute by incorporating the unweighting procedure. Formally we define the
operation R as reweighting with the weights given by Eq. (11), and an unweighting operation
U, as described in Sec. 3.1. Note that because the unweighting operator is a projection
operator, it has no inverse. Weighting an existing PDF set by incorporating information
from a new dataset then consists of the combined 'weighting' operation W = UR. The
weighting operation takes a set of replicas $\{f_k\}$, all equally probable, and replaces it with
a subset which are again all equally probable, but the selection of which reflects
information contained in the new dataset that was used in the reweighting R. Clearly W has no
inverse, since it projects onto a lower dimensional space.

Now consider two datasets: the set of replicas produced by the action of weighting with
the first dataset, $W_1$, can be subject to a further weighting with the second dataset, $W_2$.
Now of course the formula used to evaluate the weights used for the second reweighting
must again be given by Eq. (11): the subset of replicas produced by $W_1$ are again all
equally probable, so the second reweighting must work in precisely the same way as the
first. The only difference is that $W_2$ acts only on those replicas produced by the action of
$W_1$.

Now for consistency it cannot matter in what order we perform these two weightings,
and indeed their combined effect must be the same as for a single weighting $W_{12}$, which
treats the two datasets as a single dataset: $W_{12} = W_2W_1 = W_1W_2$, or more explicitly

$$UR_{12} = UR_2UR_1 = UR_1UR_2 . \qquad (29)$$

So, for weighting to be consistent it must satisfy two nontrivial conditions: the combination
property, and the commutation property. Clearly the first always implies the second (if
$W_1W_2 = W_{12}$, clearly $W_2W_1 = W_1W_2$, because $R_{12}$ is performed using weights determined
through the total $\chi^2 = \chi_1^2 + \chi_2^2$), but not the reverse (we might have $W_2W_1 = W_1W_2 \neq W_{12}$
if the formula Eq. (11) was incorrect).

In the remaining part of this Section we present two tests of the combination and 
commutation properties when two datasets are included. First, we consider sets of data 
for the same observable (the one-jet inclusive cross-section) in the same kinematic region 
by two different experiments. Then, we consider data for two different observables (a 
jet cross-section and a Drell-Yan cross section) which affect different PDFs in different 
kinematic regions. 
                    CDF      D0       CDF+D0
Data points         76       110      186
$N_{\rm eff}$       290.8    565.8    334.5

Table 1: Datasets used in the Tevatron Run II inclusive jet reweighting exercise. For each
set the number of data points and the effective number of replicas $N_{\rm eff}$ of the
reweighted set of $N_{\rm rep} = 1000$ replicas are given.



Figure 6: Comparison of the large-x gluon PDF for the prior set, reweighted sets with
different successive reweighting orders, and the refitted set, when the jet data of Table 1
are included in the NNPDF2.0 NLO DIS+DY fit. Results are shown at $Q_0^2 = 2$ GeV$^2$, both
in absolute scale (left) and as a ratio to the prior (right).

4.2 Tevatron Inclusive Jets 

The first exercise we present is an extension of the reweighting proof-of-concept in Section 4
of Ref. [14]. There, Run II Tevatron inclusive jet production data were included by reweighting
a PDF set extracted from a NLO fit to DIS and Drell-Yan data (NNPDF2.0 DIS+DY), and
the results were compared to those obtained from a fit which included the same DIS, Drell-Yan
and inclusive jet datasets all treated in the same way (NNPDF2.0).

In this Section we look again at the inclusion via reweighting of the same datasets,
namely the CDF Run II $k_T$ and D0 Run II cone inclusive jet data in the NNPDF2.0
DIS+DY fit, but we now focus on comparing the results obtained in the following two
cases:

(a) the two new datasets are included by reweighting the prior fit in a single step with 
both datasets; 

(b) one of the datasets is included by reweighting, an unweighted set of PDFs is
constructed using the procedure detailed in Section 3, and finally the latter set is
reweighted again with the second dataset.

We will carry out the successive reweighting procedure (b) twice, exchanging the order
in which the CDF and D0 datasets are included, in order to test the commutativity of
the procedure. A final unweighting is performed for all the reweighted sets, and the PDF
comparisons and computations of distances are performed using these unweighted sets.

The number of data points and the effective number of replicas $N_{\rm eff}$ after reweighting
with these data of a set of $N_{\rm rep} = 1000$ replicas are summarized in Table 1. In each
Figure 7: Distances between central values (left) and uncertainties (right) of PDFs from
reweighting with the combined CDF+D0 dataset and PDFs from reweighting first with
CDF data and then with D0 data.

Figure 8: Distances between central values (left) and uncertainties (right) of PDFs
obtained by reweighting with CDF and D0 jet data included in either order.





                    (CDF+D0)   E605     (CDF+D0)+E605
Data points         186        119      305
$N_{\rm eff}$       627.1      59.5     63.7

Table 2: As Table 1, but now for the E605 and inclusive jet reweighting exercise.



case, we construct a final set of $N_{\rm rep} = 100$ unweighted replicas. When the reweighting
is performed in two steps, we first construct a (redundant) set of 1000 unweighted replicas,
which is then reweighted and unweighted again to obtain the final set of 100 unweighted
replicas.

As discussed in Refs. [7,11], Tevatron jet data mostly affect the gluon at large x,
leaving all other PDFs essentially unchanged. The impact of the inclusion of these data
in the fit is shown in Fig. 6, where we compare the gluon for the prior set, the refitted one,
and sets obtained by reweighting the prior in the three different ways described above. As in
the previous Section, a more quantitative assessment can be made by computing distances
between various pairs of PDF sets. In Fig. 7 we show the distance between PDFs obtained
by reweighting with the two sets at once and those found including CDF data first and
D0 data next, while in Fig. 8 we show distances between sets obtained by including the
CDF and D0 data in either order. It is clear that the three reweighting procedures lead
to completely equivalent results.
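To make the notion of distance concrete, the following minimal Python sketch computes the distance between the central values of one PDF at a fixed $(x, Q^2)$ point from two unweighted replica samples. The normalisation used here (difference of means divided by the combined uncertainty of the two means, so that statistically equivalent sets give values of order one) is an assumption about the definition adopted in the previous Section, and the function name is hypothetical.

```python
import math
from statistics import mean, variance

def central_value_distance(sample_a, sample_b):
    """Distance between the means of two replica samples of one PDF value.

    Assumed form: |<q>_a - <q>_b| / sqrt(var_a/N_a + var_b/N_b),
    i.e. the difference in units of the uncertainty of the two means.
    """
    var_mean_a = variance(sample_a) / len(sample_a)
    var_mean_b = variance(sample_b) / len(sample_b)
    return abs(mean(sample_a) - mean(sample_b)) / math.sqrt(var_mean_a + var_mean_b)
```

With this normalisation, a distance of order one means that the two sets cannot be told apart within the statistical fluctuations of a finite replica sample.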
Figure 9: Comparison of the large-x gluon and quark valence PDFs for the prior set and
reweighted sets with different successive reweighting orders, when the jet and Drell-Yan
data of Table 2 are included in the NNPDF2.1 NLO DIS fit. Results are shown at
$Q_0^2 = 2$ GeV$^2$, both in absolute scale (left) and as a ratio to the prior (right).

4.3 Jet and Drell-Yan data 

In this second exercise we start from a NLO fit to DIS data, NNPDF2.1 NLO DIS [8], and
include the Tevatron inclusive jet data discussed in the previous section (D0 and CDF as
a single dataset) and data from one of the Drell-Yan experiments which are included in
the NNPDF2.1 global analysis (the E605 fixed target experiment [17]).

The number of data points and the effective number of replicas $N_{\rm eff}$ in this case are
summarized in Table 2. Also in this case, we construct a set of $N_{\rm rep} = 100$ unweighted
replicas, with $N_{\rm rep} = 1000$ unweighted replicas in the intermediate step, if any. Note
that this is a much less symmetric example than the previous one: the Drell-Yan data have a
much greater impact than the jet data (in fact for the Drell-Yan data $N_{\rm dat} > N_{\rm eff}$).

As already mentioned, the jet data affect mostly the large x gluon, while the Drell-Yan
data have mostly an impact on the quark flavour and antiflavour separation. The impact
of these data on the gluon and the total quark valence distribution is shown in Fig. 9,
where we show the results obtained by reweighting with the two sets included together,
or one after another in either order. Note that in this case we do not have a refitted set.
Distances between PDFs obtained by reweighting with the combined set, or first with jets
and then with Drell-Yan, are shown in Fig. 10. Distances between PDFs obtained by reweighting
in either order are shown in Fig. 11. The test is clearly as successful here as it was in the
previous case, despite being perhaps more challenging.
Figure 10: Distances between central values (left) and uncertainties (right) of PDFs from
reweighting with the combined jet+Drell-Yan dataset and PDFs from reweighting first
with jet data and then with Drell-Yan data.

Figure 11: Distances between central values (left) and uncertainties (right) of PDFs
obtained by reweighting with jet data and Drell-Yan data included in either order.
5 The W asymmetry at the LHC



In this section we will use the reweighting technique presented here and in Ref. [14] to
study the effect of including in the NNPDF2.1 NLO global fit the W lepton asymmetry
measurements produced by the experimental collaborations at the LHC, based on
data collected in the 2010 run.

The W leptonic charge asymmetry is defined in terms of the $W^\pm \to l^\pm \nu_l$ differential
cross-sections $d\sigma_{l^\pm}/d\eta_l$, with $\eta_l$ being the pseudorapidity of the lepton
coming from the decay of the W boson, as

$$A_W^l = \frac{d\sigma_{l^+}/d\eta_l - d\sigma_{l^-}/d\eta_l}{d\sigma_{l^+}/d\eta_l + d\sigma_{l^-}/d\eta_l},$$

where the cross-sections are computed inside the acceptance cuts used to select the
$W \to l\nu_l$ events.
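As a worked illustration of this definition, the asymmetry can be evaluated bin by bin from the two charge-separated cross-sections. The sketch below is only an illustration; the function name and the input values in the usage note are hypothetical.

```python
def charge_asymmetry(sigma_plus, sigma_minus):
    """Bin-by-bin lepton charge asymmetry from the l+ and l- differential
    cross-sections (same pseudorapidity binning, same acceptance cuts)."""
    return [(p - m) / (p + m) for p, m in zip(sigma_plus, sigma_minus)]
```

For example, a bin with an $l^+$ cross-section three times the $l^-$ one gives an asymmetry of 0.5. Any overall normalisation common to the two cross-sections cancels in the ratio.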

The ATLAS Collaboration published a first measurement of the muon charge asymmetry
from W boson production in the pseudorapidity range $|\eta| < 2.4$, based on 31 pb$^{-1}$
of accumulated luminosity [22], while CMS published a measurement of the muon and the
electron charge asymmetries in the pseudorapidity range $|\eta| < 2.2$, based on 36 pb$^{-1}$ of
data [23]. The data provide a constraint for the above combination of PDFs in the region
$10^{-3} < x < 10^{-1}$, where they are only partially constrained by the data already included
in the NNPDF global analysis. In particular, while $u$ is very well determined by fixed
target DIS data, $d$ and the light sea $(\bar{d} - \bar{u})$ are currently much less constrained.

The LHCb collaboration presented preliminary results for a measurement of the muon
charge asymmetry in the pseudorapidity range $2 < \eta < 4.5$, covered by the LHCb
detector. This measurement probes PDFs in the small and large x regions, where data
included so far in the global analyses provide much looser constraints. For this reason 
they might eventually have a substantially larger impact on global fits than the ATLAS 
or CMS data. However, at the time of writing these experimental results have only been 
presented in preliminary form [23], and are therefore not included in this study. 



5.1 Inclusion of individual experiments 

We begin by checking the compatibility of the individual ATLAS and CMS datasets for
the lepton charge asymmetry with the data included in the NNPDF2.1 global fit, and by
studying their impact when they are included separately in the fit using the reweighting
technique presented in this paper.

The ATLAS muon charge asymmetry data [22] and CMS electron and muon data [23]
are compared to the predictions obtained using three different NLO global fits, CT10 [25],
MSTW2008 [26] and NNPDF2.1, in Fig. 12. The theoretical predictions including NLO
QCD corrections are obtained using the fully differential Monte Carlo code DYNNLO [27],
which allows for the implementation of arbitrary experimental cuts.

To give a more quantitative estimate of the level of agreement of the different
predictions with the experimental data, in Table 3 we collect the $\chi^2$ per number of data
points for each individual dataset. Since no covariance matrix is provided by the LHC
experiments at this point, we add statistical and systematic uncertainties in quadrature
in the computation of the $\chi^2$ values.
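In the absence of a covariance matrix the $\chi^2$ reduces to a sum over uncorrelated points. A minimal Python sketch of this computation follows; the function name is hypothetical and the numbers in the usage note are purely illustrative, not experimental values.

```python
def chi2_per_point(data, theory, stat, syst):
    """chi^2 / N_dat with statistical and systematic uncertainties added
    in quadrature and all points treated as uncorrelated."""
    total = 0.0
    for d, t, s_stat, s_syst in zip(data, theory, stat, syst):
        total += (d - t) ** 2 / (s_stat ** 2 + s_syst ** 2)
    return total / len(data)
```

For instance, two points with residuals 1 and 0 and total uncertainties 5 and $\sqrt{2}$ give $\chi^2/N_{\rm dat} = (1/25 + 0)/2 = 0.02$. Treating correlated systematics as uncorrelated in this way generally overestimates the total uncertainty.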
Figure 12: Predictions for the W lepton asymmetry at NLO, obtained with DYNNLO [27]
using the CT10, MSTW08 and NNPDF2.1 parton sets, compared to measurements for
the muon charge asymmetry from ATLAS [22] (left plot), and the electron (centre plot)
and muon (right plot) charge asymmetries from CMS [23].





Dataset                                        $N_{\rm dat}$   NNPDF2.1   CT10    MSTW08
ATLAS (31 pb$^{-1}$)                           11              0.76       0.77    3.32
CMS (36 pb$^{-1}$) electron $p_T > 25$ GeV     6               1.83       1.19    1.70
CMS (36 pb$^{-1}$) muon $p_T > 25$ GeV         6               1.24       0.73    0.77

Table 3: Values of $\chi^2/N_{\rm dat}$ for the ATLAS and CMS lepton charge asymmetry data
for different PDF sets. Theory predictions are computed at NLO accuracy using the
DYNNLO code. Note that in Ref. [22] a somewhat lower value is quoted for MSTW08,
due to the use of the MC@NLO code.

The ATLAS muon charge asymmetry data are already very well described by the
NNPDF2.1 prediction before being included in the analysis. This is shown by the excellent
$\chi^2/N_{\rm dat} = 0.76$ reported in Table 3 and demonstrated by the distribution of
$\chi^2/N_{\rm dat}$ for the individual replicas before reweighting, shown in the left plot of
Fig. 13, which has a sharp peak around one. The compatibility of a new dataset with the data
already included in a global analysis can be assessed by looking at the probability density
for the parameter $\alpha$, $P(\alpha)$, defined in Eq. (12) of [11]. If this probability
distribution peaks close to one, the new data are consistent with the ones already included
in the global fit. For the ATLAS data, the $P(\alpha)$ distribution, shown in the right plot of
Fig. 13, is peaked slightly below one, thereby showing the good compatibility of these data
with those included in the global analysis. Note that optimal values of $\chi^2/N_{\rm dat}$
are to be expected because statistical and systematic errors have been added in quadrature,
thereby leading to an overestimation of uncertainties.
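The definition of $P(\alpha)$ is not reproduced in this section. As a hedged illustration, the sketch below assumes the form $P(\alpha) \propto \frac{1}{\alpha} \sum_k w_k(\alpha)$, where $w_k(\alpha)$ is the (unnormalised) replica weight evaluated with $\chi^2_k \to \chi^2_k/\alpha^2$, and that the weight has the form $(\chi^2_k)^{(n-1)/2} e^{-\chi^2_k/2}$. Both forms and the function names are assumptions made for this example.

```python
import math

def p_alpha(chi2, n_dat, alphas):
    """Unnormalised P(alpha) on a grid, under the assumed form
    P(alpha) ∝ (1/alpha) * sum_k w_k(alpha), with chi2_k -> chi2_k/alpha^2."""
    log_terms = [
        [0.5 * (n_dat - 1) * math.log(c / a**2) - 0.5 * c / a**2 - math.log(a)
         for c in chi2]
        for a in alphas
    ]
    shift = max(max(row) for row in log_terms)  # common shift for stability
    return [sum(math.exp(v - shift) for v in row) for row in log_terms]

def peak(alphas, density):
    """Grid point at which the density is largest."""
    return max(zip(density, alphas))[1]
```

Under these assumptions, replicas that all have $\chi^2_k = n$ give a distribution peaked at $\alpha = 1$, while replicas with $\chi^2_k = 4n$ give a peak at $\alpha = 2$, signalling that the quoted uncertainties would have to be doubled for the new data to be compatible.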

After reweighting NNPDF2.1 with the ATLAS data the quality of their description
remains substantially unchanged, with the value $\chi^2_{\rm rw}/N_{\rm dat} = 0.72$. The number
of effective replicas of the reweighted set, computed according to Eq. (42) in the Appendix
of [14], is $N_{\rm eff} = 928$, out of the initial number of $N_{\rm rep} = 1000$ replicas in the
prior. The distribution of the $\chi^2/N_{\rm dat}$ for the weighted replicas, shown in the center
plot of Fig. 13, peaks just below one, again confirming the very good description of these
data also after reweighting.

Given the outcome of the previous statistical analysis (a very good description of
the data by the prior set to start with, resulting in a large number of surviving replicas,
$N_{\rm eff} = 928$) it is easy to predict that the ATLAS data alone will impose only mild
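The effective number of replicas can be sketched as follows. The sketch assumes the Shannon-entropy form $N_{\rm eff} = \exp\left[\frac{1}{N}\sum_k w_k \ln(N/w_k)\right]$ with weights normalised to sum to $N$; this is our assumption about the content of the equation cited above, and the function name is hypothetical.

```python
import math

def effective_replicas(weights):
    """Effective number of replicas from the Shannon entropy of the weights
    (assumed form: exp[(1/N) * sum_k w_k * ln(N / w_k)], with sum_k w_k = N)."""
    n = len(weights)
    norm = n / sum(weights)
    w = [x * norm for x in weights]
    entropy = sum(x * math.log(n / x) for x in w if x > 0.0) / n
    return math.exp(entropy)
```

With equal weights the function returns the full number of replicas, and when a single replica carries all the weight it returns one, so that $N_{\rm eff}/N_{\rm rep}$ measures how much of the prior ensemble survives the new data.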
Figure 13: Distribution of $\chi^2/N_{\rm dat}$ for individual replicas before (left) and after
(middle) reweighting, and $P(\alpha)$ distribution (right), for the ATLAS muon charge asymmetry
data. In the left plot the shaded region corresponds to the central 68% of the distribution.

Figure 14: Comparison of light quark and antiquark distributions at the scale $Q^2 = M_W^2$
from the NNPDF2.1 NLO global fit and the same distributions obtained after
adding ATLAS muon charge asymmetry data via reweighting. Parton densities are plotted
normalized to the NNPDF2.1 central value.

constraints on the underlying PDFs. This is in fact what is seen in Fig. 14, where we
compare the NNPDF2.1 light (anti)flavour densities at the scale $Q^2 = M_W^2$ to the ones
obtained after reweighting with the ATLAS data. The most noticeable effect is a reduction
of the uncertainties on these PDFs in the medium-small x region, around $x \sim 10^{-3}$, by
up to 20%.

We now turn to the CMS measurements described in [23]. CMS presented data for
both the electron and muon charge asymmetries from W decays with two different cuts on
the transverse momentum of the detected lepton: $p_T > 25$ GeV and $p_T > 30$ GeV. From
the values for $\chi^2/N_{\rm dat}$ obtained using the NNPDF2.1 global set reported in Table 3 and
the plots of the distribution of $\chi^2/N_{\rm dat}$ for individual replicas and of the $P(\alpha)$ distribution
Figure 15: Same as Fig. 13 for the CMS ($p_T > 25$ GeV) (top) and CMS ($p_T > 30$ GeV)
(bottom) lepton charge asymmetry data.

Figure 16: Same as Fig. 14 but after adding CMS lepton charge asymmetry data.

shown in Fig. 15, we see that both sets are equally well described by the NNPDF2.1 set
and thus compatible with the data included in the global analysis. Since the two datasets
are not independent we have to choose which one to use in our reweighting analysis, and
thus we only consider the dataset with the looser cut $p_T > 25$ GeV, which proves to be
more constraining of the PDFs. We perform our reweighting analysis including the muon
and electron data as a single dataset.

The NNPDF2.1 prediction provides a good, though not optimal, description of the
CMS data, as shown by the $\chi^2/N_{\rm dat} = 1.51$ obtained combining the values for the
electron and muon data collected in Table 3. After reweighting, the description of these data
improves significantly, with $\chi^2_{\rm rw}/N_{\rm dat} = 0.77$. The number of effective replicas
computed as above is roughly half the initial number of replicas, $N_{\rm eff} = 531$ out of
$N_{\rm rep} = 1000$, suggesting that these data will have a significant impact on the PDFs.
The distribution of the $\chi^2/N_{\rm dat}$ of individual replicas after reweighting is centered
around one, as shown in the middle-upper plot of Fig. 15.

The impact of the CMS data on light (anti)flavour PDFs is shown in Fig. 16, where we
observe a reduction of uncertainties in the medium x region smaller than that due to the
ATLAS data, but also a change in the shape of the u and d distributions at relatively large
$x \sim 0.1$, pushing up the central value a little and reducing the uncertainties by around
10% for the down distributions and as much as 25% for the up.

We conclude this Section by comparing the predictions for the charge asymmetry
computed with NNPDF2.1 and with NNPDF2.1 after reweighting with the ATLAS and CMS
data respectively in Fig. 17. The effect on the prediction for the CMS data is more
substantial, because the data undershoot the NNPDF2.1 NLO prediction in most of the
higher rapidity bins.




Figure 17: Comparison of the lepton charge asymmetry from W boson production com- 
puted with the NNPDF2.1 NLO PDF set and sets where ATLAS (left) and CMS (right) 
lepton charge asymmetry data have been included using reweighting. 

5.2 Combination of ATLAS and CMS data 

We now consider adding the ATLAS and CMS lepton charge asymmetry data as a single 
dataset to the NNPDF2.1 NLO global fit using reweighting. 

The whole dataset is already well described by the NNPDF2.1 NLO set, with
$\chi^2/N_{\rm dat} = 1.17$ and the distribution of $\chi^2/N_{\rm dat}$ for individual replicas
having a sharp peak around one, as shown by the left plot in Fig. 18. The compatibility of the
ATLAS+CMS data with the data included in the global analysis, and between the two
experiments, is also good, as can be deduced by looking at the $P(\alpha)$ distribution shown
in the right plot in Fig. 18, which is nicely peaked around one.

After reweighting the description of the data improves, with $\chi^2_{\rm rw}/N_{\rm dat} = 0.95$,
and with the distribution of $\chi^2_{\rm rw}/N_{\rm dat}$ for individual replicas, shown in the
middle plot of Fig. 18, showing a sharp peak around one. These results, combined with the
number of effective replicas surviving after reweighting, namely $N_{\rm eff} = 619$ out of the
initial $N_{\rm rep} = 1000$, show that the use of the ATLAS and CMS data together in the fit is
not only possible but imposes
Figure 18: Same as Fig. 13 for the combined ATLAS+CMS ($p_T > 25$ GeV) lepton charge
asymmetry data.

Figure 19: Same as Fig. 14 but after adding both the ATLAS and CMS lepton charge
asymmetry data.

a moderate constraint on PDFs. However, the constraint is not quite so great as with the
CMS data alone, suggesting a mild incompatibility, particularly in the high rapidity bins.

The impact of the data on the light flavour and anti-flavour distributions is shown in
Fig. 19, where we compare the u and d quark and antiquark distributions at the scale
$Q^2 = M_W^2$ from the NNPDF2.1 global fit and the ones obtained after adding the ATLAS
and CMS lepton charge asymmetry data using reweighting. There is around 20% reduction
in uncertainties around $x \sim 10^{-3}$, mainly due to the ATLAS data, complemented by a
reduction of between 10% and 25% at larger x, mainly due to the CMS data.
6 Global PDFs including LHC data



In this section we will check the consistency of the D0 and ATLAS+CMS datasets among
themselves, and use both datasets to reweight the NNPDF2.1 NLO PDFs. The unweighting
method presented in Sect. 3 is then used to produce a set of 100 unweighted replicas.
The final product of this analysis is a new set of NNPDF parton distribution functions,
NNPDF2.2 NLO, which includes, together with all the datasets already included in the
NNPDF2.1 NLO global set, the D0, ATLAS and CMS lepton charge asymmetry data
described above.

6.1 Tevatron W asymmetry data 

In Ref. [14] we used the reweighting technique to study the compatibility of the D0 W
lepton charge asymmetry data with the data included in the NNPDF2.0 NLO global fit
and to assess their impact on the fitted parton densities. The conclusion of this study was
that the data that are inclusive in the $p_T$ of the identified lepton, namely the muon charge
asymmetry data presented in [21] and electron charge asymmetry data with $p_T > 25$ GeV
released in [20], are consistent with each other and with all the other datasets included in
NNPDF2.0, in particular with the CDF W asymmetry data [28] and the fixed-target DIS
deuteron data. When included in the fit they have a moderate impact on PDFs, providing
a reduction of the uncertainty of the valence quark distributions in the medium-high x
region ($x \sim 10^{-1}$).

Less inclusive electron charge asymmetry data were also presented in [20]. They are
binned in $p_T$, divided into two sets with 25 GeV $< p_T <$ 35 GeV and $p_T > 35$ GeV
respectively. We observed [14] that these data, which could potentially have more impact
on the PDFs, are inconsistent with some of the DIS data included in the global analysis
and have problems of internal consistency. Similar conclusions have been reported by the
MSTW [29] and CTEQ [25] collaborations, as they tried to include these datasets in the
context of a PDF global analysis. We will thus not use these datasets here.

These results, though obtained using the NNPDF2.0 global fit, remain substantially
unchanged if we use instead the NNPDF2.1 NLO global set as a prior fit to start the
reweighting analysis. The muon charge asymmetry [21] and inclusive electron charge
asymmetry data (with $p_T > 25$ GeV) [20] can thus provide additional information to that
from the ATLAS and CMS data considered in the previous section. We thus proceed
directly to a combined fit of these data together with the LHC data.

6.2 Combining LHC and Tevatron W asymmetry data 

The description of the combined ATLAS, CMS and D0 charge asymmetry datasets
obtained using the NNPDF2.1 NLO global fit, in which they were not included, is reasonably
good but not optimal, with $\chi^2/N_{\rm dat} = 2.22$: a detailed comparison is shown in Table 4.
The distribution of the combined $\chi^2/N_{\rm dat}$ for individual replicas before and after
reweighting, and the $P(\alpha)$ distribution, shown in Fig. 20, indicate however that these data
are reasonably compatible with the data already included in the NNPDF2.1 analysis and would
provide a significant constraint on the PDFs.

These conclusions are indeed confirmed when the effect of the ATLAS, CMS and D0
data is included using the reweighting technique. After reweighting their overall description
improves significantly, with a combined $\chi^2_{\rm rw}/N_{\rm dat} = 0.81$. This is due to a significant
Experiment     $N_{\rm dat}$   NNPDF2.1   NNPDF2.1 LHC   NNPDF2.2
NMC-pd         132             0.97       0.95           0.97
NMC            221             1.73       1.72           1.72
SLAC           74              1.33       1.26           1.28
BCDMS          581             1.24       1.23           1.23
HERAI-AV       592             1.07       1.07           1.07
CHORUS         862             1.15       1.15           1.15
FLH108         8               1.37       1.37           1.37
NTVDMN         79              0.79       0.74           0.70
ZEUS-H2        127             1.29       1.28           1.28
ZEUSF2C        50              0.78       0.79           0.78
H1F2C          38              1.51       1.52           1.51
DYE605         119             0.84       0.84           0.86
DYE886         199             1.25       1.23           1.27
CDFWASY        13              1.85       1.81           1.81
CDFZRAP        29              1.66       1.61           1.70
D0ZRAP         28              0.60       0.60           0.58
CDFR2KT        76              0.98       0.98           0.96
D0R2CON        110             0.84       0.84           0.83
ATLASmuASY     11              [0.77]     0.97           1.07
CMSeASY        6               [1.83]     1.23           1.08
CMSmuASY       6               [1.24]     0.63           0.56
D0eASY         12              [4.39]     [3.46]         1.38
D0muASY        10              [1.48]     [1.17]         0.35
Total                          1.165      1.158          1.157

Table 4: Table of $\chi^2/N_{\rm dat}$ values for the experiments included in the NNPDF2.1 NLO fit,
the NNPDF2.1 LHC fit discussed in Section 5 and the NNPDF2.2 NLO fit. The numbers
in square brackets correspond to the experiments which are not included in the fit. The
three fits thus have respectively $N_{\rm dat} = 3338$, 3361 and 3383.
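The three totals quoted in the caption can be cross-checked directly from the $N_{\rm dat}$ column of Table 4. The short Python sketch below simply adds up the table entries; the variable names are ours.

```python
# Number of data points per experiment, copied from Table 4.
n_dat_nnpdf21 = {
    "NMC-pd": 132, "NMC": 221, "SLAC": 74, "BCDMS": 581, "HERAI-AV": 592,
    "CHORUS": 862, "FLH108": 8, "NTVDMN": 79, "ZEUS-H2": 127, "ZEUSF2C": 50,
    "H1F2C": 38, "DYE605": 119, "DYE886": 199, "CDFWASY": 13, "CDFZRAP": 29,
    "D0ZRAP": 28, "CDFR2KT": 76, "D0R2CON": 110,
}
n_dat_lhc = {"ATLASmuASY": 11, "CMSeASY": 6, "CMSmuASY": 6}
n_dat_d0 = {"D0eASY": 12, "D0muASY": 10}

total_21 = sum(n_dat_nnpdf21.values())              # NNPDF2.1
total_21_lhc = total_21 + sum(n_dat_lhc.values())   # NNPDF2.1 LHC
total_22 = total_21_lhc + sum(n_dat_d0.values())    # NNPDF2.2

print(total_21, total_21_lhc, total_22)  # 3338 3361 3383
```

The LHC data thus add 23 points and the D0 asymmetry data a further 22, a very small fraction of the global dataset.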
Figure 20: Same as Fig. 13 for the combined D0+ATLAS+CMS ($p_T > 25$ GeV) dataset.

Figure 21: Comparison of light quark and antiquark distributions at the scale $Q^2 = M_W^2$
from the NNPDF2.1 and NNPDF2.2 global fits. Parton densities are plotted
normalized to the NNPDF2.1 central value.

improvement in the fit to the CMS and the D0 data: the fit to the ATLAS data deteriorates
a little, again showing that there is some tension. The number of effective replicas
is now $N_{\rm eff} = 181$ out of the initial $N_{\rm rep} = 1000$, showing that the W lepton
asymmetry data indeed introduce very significant constraints on the PDFs. The distribution of
the $\chi^2/N_{\rm dat}$ for the individual replicas after reweighting, shown in the middle plot of
Fig. 20, is peaked around one, confirming the compatibility of these data with the other
datasets included in the global analysis.

After reweighting, the unweighting procedure of Sec. 3 may be used to give a 100
replica set of PDFs equivalent to a global fit which includes all the data already included
in NNPDF2.1, plus the ATLAS, CMS and D0 W asymmetry data. We call this new NLO
PDF set NNPDF2.2. The quality of the fit to all the datasets used in this new fit is shown
in Table 4. There is no significant deterioration in the $\chi^2$ of other datasets included
in the global fit, and the fit to the NuTeV dimuon data improves significantly. The overall
$\chi^2_{\rm tot}/N_{\rm dat}$ thus also improves a little.
Figure 22: The percentage change in the uncertainty in the light quark and antiquark
distributions at the scale $Q^2 = M_W^2$ in the NNPDF2.1 NLO global fit, after adding
ATLAS, CMS and D0 lepton charge asymmetry data via reweighting. The four curves
show in each case the effect of ATLAS (red) and CMS (pink) only, together (blue), and
then together with the D0 data (green), i.e. NNPDF2.2.

The impact on light flavour and anti-flavour PDFs is shown in Fig. 21, where we
compare the u and d quark and antiquark distributions at the scale $Q^2 = M_W^2$ from
the NNPDF2.1 NLO set to the ones obtained for the NNPDF2.2 NLO set. The most
noticeable effects of the inclusion of the new data are concentrated in two separate regions
of x, namely the $x \sim 10^{-3}$ region, which is mostly affected by the ATLAS data, and the
$x \sim 10^{-2} - 10^{-1}$ region, which is mostly affected by the CMS and D0 data. In each of
these regions, the W asymmetry data lead to a reduction of uncertainties on the light
flavour and antiflavour distributions, of around 20% in the low x region, and up to 30%
at higher x when CMS and D0 are combined (see Fig. 22). At higher x changes in the
central values for these PDFs by up to one sigma are also observed: these are mainly due
to the D0 data (compare Fig. 21 with Fig. 19).
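The percentage reductions quoted here compare the PDF uncertainty before and after the new data are included. A minimal Python sketch of this comparison at a single $(x, Q^2)$ point follows, assuming that the uncertainty is the weighted standard deviation over replicas; the function names and the numbers in the usage note are illustrative only.

```python
import math

def weighted_mean_std(values, weights):
    """Weighted mean and standard deviation of one PDF value over replicas.
    With equal weights this reduces to the usual Monte Carlo estimate."""
    total = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / total
    var = sum(w * (v - mean) ** 2 for w, v in zip(weights, values)) / total
    return mean, math.sqrt(var)

def percent_reduction(sigma_prior, sigma_new):
    """Percentage reduction of the uncertainty with respect to the prior."""
    return 100.0 * (1.0 - sigma_new / sigma_prior)
```

For example, an uncertainty that drops from 1.0 to 0.8 corresponds to a 20% reduction. The same functions can be applied either to the reweighted prior (with its weights) or to the unweighted set (with equal weights).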

As recently shown in the extensive studies carried out in the context of the PDF4LHC 
Working Group [30], there is rather good agreement among NLO parton distributions 
determined from the widest global datasets, specifically by the NNPDF, MSTW and 
CTEQ groups. However, there still are some significant differences, notably in the flavour 
separation at medium-large x. Since this is the region which is directly probed by the 
Tevatron and LHC lepton charge asymmetry data studied here, these data might help in 
resolving some of these outstanding incompatibilities. 

To this end, in Figs. 23 and 24 we compare the d/u and (d - u) combinations at the
scale $Q^2 = M_W^2$ obtained in the NNPDF2.1 and MSTW08 NLO global analyses, which
do not include any of the W asymmetry data, the CT10 analysis, which includes only
the D0 data, and the new NNPDF2.2 fit, which also includes the ATLAS and CMS data.
The new data lie in a region of x where the compatibility between the results obtained by
different collaborations is at best marginal: in particular the d/u ratio given by MSTW08
is too low at large x and too high at medium x. The reduction of uncertainty when going
from NNPDF2.1 to NNPDF2.2 is quite visible: the NNPDF2.2 prediction should thus be
taken as the most reliable at present. Future LHC data will constrain the light quark
PDFs in this region even more.




Figure 23: Comparison of the d/u ratio at $Q^2 = M_W^2$ in NNPDF2.1, CT10, MSTW08
and NNPDF2.2. Upper plots show absolute values, while the lower plots show the ratio
to NNPDF2.1.
7 Conclusions and outlook



The reweighting method which we have reviewed, re-derived and refined in this paper is
a powerful technique which enables one both to perform interesting studies of the
statistical properties of parton distributions viewed as probability distributions in a space of
functions, and to rapidly and effectively include new experimental information in parton
sets. Coupled to the unweighting method that we have presented and tested here, it allows
one to quickly upgrade existing Monte Carlo replica PDF sets to new sets which, while
retaining the same format, include new experimental information.

The method has been used here to construct the NNPDF2.2 NLO PDF set, the
first PDF set to include LHC data. This will doubtless be the first of many such sets:
the quantity, quality and diversity of LHC measurements potentially relevant for PDF
determination are now growing at an impressive rate.



The NNPDF2.2 NLO PDF set that has been presented in Section 6 is available
from the NNPDF web site,

http://sophia.ecm.ub.es/nnpdf

and will also be available through the LHAPDF interface [31]:

• NNPDF2.2 NLO, set of $N_{\rm rep} = 100$ replicas:
NNPDF22_nlo_100.LHgrid